In some cases the outcome of the analysis is well defined and understood: it may be creditworthiness, a diagnosed fault or a similar observed event. The objective of analysing the data is then to develop a tree that relates the outcome to the other observed attribute fields. The resulting tree helps in understanding the relationships between the outcome and the attributes. It can also be used as a data model to predict the outcome from the attributes.
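To make the two uses of a tree concrete, here is a minimal sketch in Python. The tree, its attribute names (`time_in_employment`, `salary`) and its branch values are hypothetical, invented only for illustration; they are not taken from any real data set or tool. The `rules` function reads the tree as a set of relationships, and `predict` uses it as a data model.

```python
# Hypothetical tree for a creditworthiness outcome (invented for illustration).
# Internal nodes test one attribute; leaves hold the predicted outcome.
tree = {
    "attribute": "time_in_employment",
    "branches": {
        "under_2_years": {
            "attribute": "salary",
            "branches": {"low": "not creditworthy", "high": "creditworthy"},
        },
        "2_years_or_more": "creditworthy",
    },
}


def predict(node, record):
    """Follow the branch matching the record's attribute values until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


def rules(node, path=()):
    """List every root-to-leaf path as (conditions, outcome)."""
    if not isinstance(node, dict):
        return [(path, node)]
    found = []
    for value, child in node["branches"].items():
        found.extend(rules(child, path + ((node["attribute"], value),)))
    return found


applicant = {"time_in_employment": "under_2_years", "salary": "low"}
print(predict(tree, applicant))

for conditions, outcome in rules(tree):
    print(" and ".join(f"{a} = {v}" for a, v in conditions), "->", outcome)
```

Each printed rule is one path from the root to a leaf, which is the sense in which a tree helps in understanding how the attributes relate to the outcome.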
In other cases the data being analysed may not have an obvious outcome field. The objective is then to experiment with different outcome fields, testing for relationships that allow one or more fields to be profiled using the others. As an example, consider a database of mortgage loan applications. The data fields may be 'occupation of applicant', 'occupation of spouse', 'salary', 'time in current employment', 'type of property' and 'property value'. Because the data records applications only, with no observed event such as 'arrears in making payment', there is no obvious outcome. However, many patterns and relationships may still be discovered from such data. For example, we can search for patterns relating the 'type of property' or the 'property value' to the other data fields. In this case 'type of property' and 'property value' are each, in turn, treated as the outcome.
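One way to experiment with different outcome fields can be sketched as follows. The sketch treats each candidate field as the outcome in turn and ranks the remaining fields by how much they tell us about it. It makes several assumptions that are not in the text above: the records are invented, salary and property value are pre-binned into categories, and information gain (a common measure in tree induction) is used as the measure of relationship.

```python
from collections import Counter
from math import log2

# Hypothetical mortgage-application records; all values are invented.
records = [
    {"occupation": "teacher", "salary": "medium", "property_type": "flat", "property_value": "low"},
    {"occupation": "engineer", "salary": "high", "property_type": "house", "property_value": "high"},
    {"occupation": "teacher", "salary": "medium", "property_type": "flat", "property_value": "low"},
    {"occupation": "engineer", "salary": "high", "property_type": "house", "property_value": "high"},
    {"occupation": "teacher", "salary": "high", "property_type": "house", "property_value": "high"},
    {"occupation": "engineer", "salary": "medium", "property_type": "flat", "property_value": "low"},
]


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, attribute, outcome):
    """Reduction in outcome entropy after splitting the rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[outcome])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[outcome] for row in rows]) - remainder


def rank_attributes(rows, outcome):
    """Rank every other field by its information gain for the chosen outcome."""
    attributes = [a for a in rows[0] if a != outcome]
    scored = [(information_gain(rows, a, outcome), a) for a in attributes]
    return sorted(scored, reverse=True)


# Treat each candidate field as the outcome in turn.
for outcome in ("property_type", "property_value"):
    print(outcome)
    for gain, attribute in rank_attributes(records, outcome):
        print(f"  {attribute}: {gain:.3f}")
```

A field that consistently ranks high for a given outcome is a candidate for profiling that outcome. Information gain is only one possible choice here; other measures of association could be substituted without changing the overall procedure.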